# Design Of Area Efficient And Low Power 4-Bit Multiplier Based On Full-swing GDI technique

Omnia Ali Albadry<sup>1</sup>, M. A. Mohamed El-Bendary<sup>2</sup>, Fathy Z. Amer<sup>3</sup>, Said M. Singy<sup>4</sup>

<sup>1</sup>Department of Electronics, Faculty of Industrial Education, Sohag University, Sohag, Egypt

<sup>2</sup>Department of Electronics Technology, Faculty of Industrial Education, Helwan University, Cairo, Egypt

<sup>3</sup>Department of Electronics and Communications, Faculty of Engineering, Helwan University, Cairo, Egypt

<sup>4</sup>Dept. of Curriculum, Teaching Methods and Educational Technology, Faculty of Education, Banha University, Qalyubia, Egypt

Omnia.albadry@yahoo.com, dr m.kassem@yahoo.com, dr fathy@hotmail.com, sayedsingy@yahoo.com

Abstract—This paper presents the design of a 4-bit multiplier built from a full adder cell based on the full-swing gate diffusion input (FS-GDI) technique. The proposed adder consists of 18 transistors and is compared with full adders in different logic styles through Cadence Virtuoso simulation based on TSMC 65 nm models at a supply voltage of 1 V and a frequency of 250 MHz. The simulation results show that the proposed full adder dissipates low power, reduces area, and provides a full-swing output voltage relative to the designs taken for comparison. The proposed full adder is used to design Array, Braun, and Baugh-Wooley multipliers; the energy and transistor count of these multipliers improve compared to CMOS.

Keywords— FS XOR-XNOR; FS-GDI; Full Adder; Multiplier; MUX; GDI.

## I. INTRODUCTION

Due to the heavy use of digital integrated circuits in battery-powered portable devices such as cellular phones, laptops, and personal digital assistants (PDAs), chip area, power consumption, and speed are vital factors that should be taken into consideration when choosing a high-performance VLSI design.

Addition is a basic arithmetic operation heavily used in VLSI designs such as multiplier-accumulator (MAC) units, microprocessors, and digital signal processing applications, so system performance is affected by the performance of the full adder. A full adder is essential in arithmetic operations such as addition, subtraction, multiplication, and division. The energy of the full adder influences the whole system [1], so it must be optimized. This can be achieved with the GDI technique.

The aim of this work is to design a 4-bit multiplier using a full adder circuit based on full-swing GDI to reduce power consumption, delay, and area, and to achieve a full-swing output with high performance.

This paper is organized as follows: Section II overviews the GDI methodology and presents its benefits and limitations. Section III presents the different types of multiplier. Section IV discusses the design of the full adder cell. Section V presents simulation results and comparison. Section VI concludes the paper.

## II. GATE DIFFUSION INPUT TECHNIQUE

A. Morgenshtein, A. Fish, and I. Wagner [2] presented GDI as a new low-power, small-silicon-area technique for VLSI digital circuits and as an alternative to complementary metal oxide semiconductor (CMOS) logic design. The primitive GDI cell is illustrated in fig. 1 (a).



Fig. 1. GDI cells; (a) Original GDI, (b) MOD-GDI

This technique consumes a small silicon area, since it can realize complex functions using only two transistors, as listed in Table I, and it improves power consumption and propagation delay. However, it was proposed for fabrication in twin-well CMOS and silicon-on-insulator processes.

TABLE I. DIFFERENT LOGIC FUNCTIONS REALIZATION USING GDI CELL.

| N              | P              | G | OUT                              | Function |
|----------------|----------------|---|----------------------------------|----------|
| 0              | B              | A | $\overline{A}B$                  | F1       |
| B              | 1              | A | $\overline{A}+B$                 | F2       |
| 1              | B              | A | $A+B$                            | OR       |
| B              | 0              | A | $AB$                             | AND      |
| C              | B              | A | $\overline{A}B+AC$               | MUX      |
| 0              | 1              | A | $\overline{A}$                   | NOT      |
| $\overline{B}$ | B              | A | $\overline{A}B+A\overline{B}$    | XOR      |
| B              | $\overline{B}$ | A | $\overline{A}\,\overline{B}+AB$  | XNOR     |
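The entries of Table I can be checked with a small behavioural model. The sketch below is only an idealised illustration (the name `gdi_cell` is hypothetical and does not come from the cited work): it assumes the PMOS passes P when G is 0 and the NMOS passes N when G is 1, and it ignores the threshold drop of a real cell.

```python
def gdi_cell(g, p, n):
    """Idealised GDI cell: OUT = (not G)*P + G*N, with no threshold drop."""
    return n if g else p


def check_table_i():
    """Compare the idealised cell against every function listed in Table I."""
    for a in (0, 1):
        for b in (0, 1):
            not_a, not_b = 1 - a, 1 - b
            assert gdi_cell(a, p=b, n=0) == (not_a & b)          # F1
            assert gdi_cell(a, p=1, n=b) == (not_a | b)          # F2
            assert gdi_cell(a, p=b, n=1) == (a | b)              # OR
            assert gdi_cell(a, p=0, n=b) == (a & b)              # AND
            assert gdi_cell(a, p=1, n=0) == not_a                # NOT
            assert gdi_cell(a, p=b, n=not_b) == (a ^ b)          # XOR
            assert gdi_cell(a, p=not_b, n=b) == 1 - (a ^ b)      # XNOR
            for c in (0, 1):
                # MUX: G selects between P (G=0) and N (G=1)
                assert gdi_cell(a, p=b, n=c) == ((not_a & b) | (a & c))
    return True
```

Calling `check_table_i()` runs through all input combinations; it says nothing about voltage levels, which is where the basic cell falls short.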

Unfortunately, this logic style suffers from limitations such as a non-full-swing output voltage caused by threshold drop: the high or low output level deviates from VDD or GND by the threshold voltage of the NMOS or PMOS transistor.

Morgenshtein et al. proposed a modified GDI technique [3] whose cell resembles the primitive GDI cell. The important difference is that the bulk terminals of the PMOS and NMOS transistors are connected to VDD and GND, respectively, as shown in fig. 1 (b).

This logic style is suitable for fabrication in a standard CMOS process and improves output voltage, power, and power-delay product compared to basic GDI logic. However, the threshold drop problem is not fully resolved, and the output voltage still degrades.



Fig. 2. FS GDI cell (a) F2, (b) F1

The threshold drop problem is solved, and the output swing degradation removed, by the full-swing GDI technique [4]. This approach uses only a swing restoration (SR) transistor to produce full-swing operation for the F1 and F2 functions, as shown in fig. 2 (a) and (b). Either F1 or F2, or a combination of both, can be used to realize many logic functions. This approach uses more transistors than standard GDI. However, compared to complementary metal oxide semiconductor (CMOS), swing restoration complementary pass transistor logic (SR-CPL), double pass transistor logic (DPL), and hybrid CMOS logic styles, it uses fewer transistors, achieves a full-swing output, consumes less power, and is more energy efficient with a smaller area.

In this paper, we focus on designing a 1-bit full adder at the transistor level with low power and optimized energy.
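To illustrate how F1 and F2 can be combined into other logic functions, the following sketch composes NOT, AND, OR, and XOR from the two Boolean expressions of Table I. The compositions and function names are assumptions chosen for illustration; they are not claimed to be the transistor-level realizations used in [4].

```python
def f1(a, b):
    """F1 = (not A) and B."""
    return (1 - a) & b


def f2(a, b):
    """F2 = (not A) or B."""
    return (1 - a) | b


def not_gate(a):
    return f1(a, 1)                 # (not A) and 1


def and_gate(a, b):
    return f1(not_gate(a), b)       # not(not A) and B


def or_gate(a, b):
    return f2(not_gate(a), b)       # not(not A) or B


def xor_gate(a, b):
    # (not A)B + A(not B), built from two F1 terms joined by an OR
    return or_gate(f1(a, b), f1(b, a))
```

Each composed gate uses only `f1` and `f2`, which is the sense in which the two functions suffice as building blocks.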

## III. MULTIPLIER

Multipliers are heavily used in microprocessors and digital signal processing to perform computationally intensive operations in image processing and video coding, and in communication systems such as wideband code division multiple access (WCDMA), carrier synchronizers, and orthogonal frequency division multiplexing (OFDM) based wireless devices.

There are many multiplier architectures that use different algorithms to perform multiplication [12]. Multiplication is realized in three steps: partial product generation, partial product addition, and final addition. The basic components of multipliers are AND gates, full adders, and half adders, and these circuits should be optimized to improve system performance. The multipliers considered here are the Array, Braun, and Baugh-Wooley multipliers.

### A. Array Multiplier

The array multiplier is the simplest parallel multiplier structure. It performs multiplication with the standard add-and-shift algorithm. The structure of the 4-bit array multiplier is presented in fig. 3. The partial product generator consists of AND gates that multiply the multiplicand by each bit of the multiplier; the partial products are then shifted according to their order and summed using full adders and half adders. In a 4x4 array multiplier, 4x4 AND gates generate the partial products, and 4x(4-2) full adders and 4 half adders sum them to generate the product.



Fig. 3. 4x4 Array Multiplier
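The add-and-shift scheme described for the array multiplier can be sketched behaviourally as follows. This is a minimal illustration, not the gate-level netlist of fig. 3: integer addition stands in for the full and half adder cells, and the function names are hypothetical. The component counts generalize the 4x4 figures quoted in the text to n bits, which is an assumption.

```python
def array_multiply(x, y, n=4):
    """Unsigned n-bit multiply: AND-gate partial products, shifted and summed."""
    a = [(x >> j) & 1 for j in range(n)]   # multiplicand bits
    b = [(y >> i) & 1 for i in range(n)]   # multiplier bits
    product = 0
    for i in range(n):
        # One row of n AND gates forms the partial product for multiplier bit i.
        row = sum((a[j] & b[i]) << j for j in range(n))
        # The row is shifted by its order i and accumulated.
        product += row << i
    return product


def array_component_counts(n=4):
    """Counts from the text for n = 4, written for general n (assumed)."""
    return {"and_gates": n * n, "full_adders": n * (n - 2), "half_adders": n}
```

For the operands used later in the simulations, `array_multiply(0b1111, 0b0111)` gives 105, that is 15 × 7.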

### B. Braun Multiplier

The Braun multiplier is a linear multiplier with a regular structure, also known as the carry-save array multiplier. It operates on the principle that the carry bits produced by one stage are not added immediately but are saved for the next addition stage. As shown in fig. 4, the 4x4 Braun multiplier consists of (4-1) rows of carry-save adders (CSAs), each containing (4-1) full adders (FAs), and a (4-1)-bit ripple-carry adder in the last row.

The main advantage of the Braun multiplier is that it has only one critical path rather than the many paths found in the array multiplier, and it is widely used in DSP applications due to its low power consumption.



Fig. 4. 4x4 Braun Multiplier
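The carry-save principle can be made concrete with a bit-level sketch. The wiring below is one common carry-save arrangement with (n-1) rows of (n-1) full adders followed by an (n-1)-bit ripple-carry adder; it is assumed for illustration and is not taken from fig. 4. The function names are hypothetical.

```python
def full_adder(a, b, cin):
    """1-bit full adder returning (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def braun_multiply(x, y, n=4):
    """Unsigned n-bit carry-save array multiply (assumed wiring)."""
    a = [(x >> j) & 1 for j in range(n)]
    b = [(y >> i) & 1 for i in range(n)]
    pp = [[a[j] & b[i] for j in range(n)] for i in range(n)]  # AND array

    product = [pp[0][0]]            # bit 0 needs no addition
    sums = pp[0][1:]                # inputs to the first carry-save row
    carries = [0] * (n - 1)         # no carries enter the first row

    for i in range(1, n):           # (n-1) carry-save rows
        new_sums, new_carries = [], []
        for j in range(n - 1):      # (n-1) full adders per row
            s, c = full_adder(pp[i][j], sums[j], carries[j])
            new_sums.append(s)
            new_carries.append(c)   # saved for the next row, not rippled
        product.append(new_sums[0])
        sums = new_sums[1:] + [pp[i][n - 1]]
        carries = new_carries

    carry = 0                       # final (n-1)-bit ripple-carry adder
    for j in range(n - 1):
        s, carry = full_adder(sums[j], carries[j], carry)
        product.append(s)
    product.append(carry)

    return sum(bit << k for k, bit in enumerate(product))
```

Within a row no carry propagates sideways; carries only ripple in the last row, which is what distinguishes this structure from a row-by-row ripple array.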

### C. Baugh-Wooley Multiplier

The Baugh-Wooley multiplier is based on a parallel array architecture and is used for both unsigned and signed multiplication. Signed operands are represented in 2's complement form, and the algorithm ensures that the signs of all partial products are positive. The 4x4 Baugh-Wooley multiplier is shown in fig. 5.



Fig. 5. 4x4 Baugh-Wooley Multiplier
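How the partial products are kept positive can be sketched with the commonly used modified Baugh-Wooley formulation, in which the partial products involving exactly one sign bit are complemented and two constant 1s are added. Whether fig. 5 uses exactly this variant is an assumption; the function name is hypothetical, and integer addition stands in for the adder array.

```python
def baugh_wooley_multiply(x, y, n=4):
    """Signed n-bit 2's complement multiply (modified Baugh-Wooley, assumed)."""
    mask = (1 << n) - 1
    a = [((x & mask) >> i) & 1 for i in range(n)]   # 2's complement bits of x
    b = [((y & mask) >> j) & 1 for j in range(n)]   # 2's complement bits of y

    total = 0
    for i in range(n):
        for j in range(n):
            bit = a[i] & b[j]
            # Complement the terms that contain exactly one sign bit,
            # so every partial product is added rather than subtracted.
            if (i == n - 1) != (j == n - 1):
                bit ^= 1
            total += bit << (i + j)

    # Correction constants: 1 at bit n and 1 at bit 2n-1.
    total += (1 << n) + (1 << (2 * n - 1))
    total &= (1 << (2 * n)) - 1                     # keep 2n result bits

    # Interpret the 2n-bit result as a signed number.
    if total >> (2 * n - 1):
        total -= 1 << (2 * n)
    return total
```

For example, `baugh_wooley_multiply(-3, 5)` returns -15, and positive operands behave as in ordinary multiplication.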

## IV. FULL ADDER

A full adder is a combinational circuit that adds three 1-bit inputs. It consists of 3 blocks (XNOR-XOR, XOR, and MUX); the block diagram of the full adder is shown in fig. 6. According to previous studies, the best implementation for Block 1 was discussed in [6]. Block 1 is built as an XNOR-XOR gate that produces $\overline{A \oplus B}$ and $A \oplus B$. The total power consumption and propagation delay of the full adder are affected by the delay and voltage swing of the XOR signal and its complement generated within the cell.



Fig. 6. Block Diagram of Proposed Full Adder

### A. XOR Gate

The XOR gate is a basic building block for realizing various digital circuits such as multipliers, comparators, adders, decoders, and compressors. The XOR function is given in equation (1).

$$A \oplus B = \overline{A}B + A\overline{B} \tag{1}$$

Reference [6] proposed a low-power XOR-XNOR (LP XOR-XNOR) circuit using 4 transistors. It consumes low power but suffers from non-full-swing output voltages: at the XOR output, when the input AB=00, all PMOS transistors switch on and pass a weak logic 0, and at the XNOR output, all NMOS transistors switch on and pass a weak logic 1, with no drive capability. Wang et al. improved on this by using a CMOS inverter [7]. The transmission gate logic style was proposed for low-power, high-speed, small-area modules [8] but has degradation in the output voltage.

Chowdhury et al. proposed an XOR gate using 3 transistors [9], which has limitations such as output degradation, low speed, and no drive capability. Morgenshtein et al. proposed an XOR based on the GDI logic style, but its output degradation makes it unusable in systems with several directly cascaded stages; they solved this problem by using a MUX to generate the XOR function [4], although this costs more area (8 transistors) and power. An XOR function with low energy, full-swing output, and high speed was proposed using the modified GDI technique and swing restoration transistors; this module contains 6 transistors [10], as shown in fig. 7 (a), with the XNOR cell shown in fig. 7 (b).



Fig. 7. GDI cell; (a) 6T-XNOR Gate, (b) 6T-XOR Gate

### B. Multiplexer (Module 3)

The basic multiplexer (MUX) has a number of input lines and one output line, and selects its output from the inputs based on a select signal [11]. The GDI MUX in fig. 6 uses only 2 transistors, but its main disadvantage is that it generates a non-full-swing output. Morgenshtein et al. solved this problem by using two GDI MUX cells [4], so that the function in expression (2) is realized with a full-swing output, as shown in fig. 8. The 2×1 multiplexer consists of 6 transistors.

$$MUX = \overline{S}A + SC \tag{2}$$



Fig. 8. FS-GDI MUX

### C. Design Of Full Adder

The proposed design uses 18 transistors to implement a 1-bit full adder, as shown in fig. 9. The first function, the sum of the 3 inputs A, B, and the carry from the previous stage $C_{in}$, is implemented by module 1 using XNOR-XOR and module 2 using XOR.

Module 1 has an inverter, formed by transistors Mp1 and Mn1, that generates $\overline{B}$, while Mn2 and Mp2 implement $A \oplus B$, with full-swing output achieved by the swing restoration transistors Mn3 and Mp3. The XNOR function is generated by transistors Mp4 and Mn4, with full-swing output achieved by the swing restoration transistors Mn5 and Mp5 instead of the inverter used in [10] to generate XNOR, which consumes more power. This function drives the second module in the circuit. Module 2 generates the final sum output, as presented in equation (3).

$$SUM = A \oplus B \oplus C_{in} \tag{3}$$

$$C_{out} = A \overline{(A \oplus B)} + C_{in} (A \oplus B) \tag{4}$$

The carry out is generated by module 3, which consists of 4 transistors. When $(A \oplus B)$ equals 1, $C_{in}$ passes through Mn9 and Mp8 to the carry output; when it equals 0, input A passes through Mp9 and Mn8 to the carry output, realizing the carry of the sum as presented in expression (4). The truth table of the 1-bit full adder is given in Table II.

TABLE II. TRUTH TABLE OF PROPOSED FULL ADDER

| A | B | $C_{in}$ | SUM | $C_{out}$ |
|---|---|-----|-----|-----------|
| 0 | 0 | 0   | 0   | 0         |
| 0 | 0 | 1   | 1   | 0         |
| 0 | 1 | 0   | 1   | 0         |
| 0 | 1 | 1   | 0   | 1         |
| 1 | 0 | 0   | 1   | 0         |
| 1 | 0 | 1   | 0   | 1         |
| 1 | 1 | 0   | 0   | 1         |
| 1 | 1 | 1   | 1   | 1         |
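Equations (3) and (4) can be checked against Table II with a short behavioural sketch. It models only the logic of the three modules (XNOR-XOR, XOR, and MUX) and not the transistor-level circuit; the function name is hypothetical.

```python
def proposed_full_adder(a, b, cin):
    """Logic-level model of the three-module full adder."""
    x = a ^ b                 # module 1: A xor B
    xn = 1 - x                # module 1: complement (XNOR)
    total = x ^ cin           # module 2: equation (3)
    # module 3 (MUX): pass Cin when A xor B = 1, else pass A; equation (4)
    cout = (a & xn) | (cin & x)
    return total, cout


# Rows of Table II as (A, B, Cin, SUM, Cout)
TABLE_II = [
    (0, 0, 0, 0, 0), (0, 0, 1, 1, 0), (0, 1, 0, 1, 0), (0, 1, 1, 0, 1),
    (1, 0, 0, 1, 0), (1, 0, 1, 0, 1), (1, 1, 0, 0, 1), (1, 1, 1, 1, 1),
]


def matches_table_ii():
    return all(proposed_full_adder(a, b, c) == (s, co)
               for a, b, c, s, co in TABLE_II)
```

The carry expression works because A equals B whenever $A \oplus B$ is 0, so passing A in that case yields the same carry as the usual majority function.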



Fig. 9. Proposed design for 1-Bit Full Adder

## V. SIMULATION RESULTS AND COMPARISON

The proposed 1-bit full adder, along with C-CMOS, CPL, DPL, TG, Hybrid, Hybrid CMOS, Mirror, SR-CPL, and GDI designs, was simulated using the TSMC 65 nm technology process. The inputs (A, B, $C_{in}$) are driven through buffers before they are fed to the adder cell, and the outputs Sum and $C_{out}$ are also loaded with buffers [13]. The XOR gate and the different multiplier types were tested using the same method. This simulation setup is shown in fig. 10.



Fig. 10. Test bench simulation

This setup approximates realistic conditions, in which the cell has both a driving circuit and a driven circuit. Simulations were done using the Spectre-based Cadence Virtuoso simulator with a power supply of 1 V and a frequency of 250 MHz. The PMOS transistor is sized at twice the NMOS width (Wp/L = 240/60 and Wn/L = 120/60 for PMOS and NMOS, respectively) to achieve the best power and delay performance.

Figures 11 and 12 show the output waveforms of the non-full-swing XOR gate and the full-swing GDI XOR gate, respectively. The results of the FS-GDI XOR compared with different logic styles are shown in Table IV; its energy is 56% less than that of SR-CPL. The results of the proposed full adder, compared with previous logic styles used in the internal structure of full adders in terms of power, delay, and energy (PDP), are shown in Table V.



Fig. 11. The output waveform of non-full swing XOR Gate



Fig. 12. The output waveform of full swing-GDI XOR Gate

The proposed adder circuit achieves 20.6% and 70.7% reductions in energy (PDP) compared to the TG (18T) and GDI full adder designs, respectively. The propagation delay of the SR-CPL adder is 8% less than that of the proposed design; however, its power consumption is about 10.7% larger than that of the proposed design. The waveform of the proposed full adder is shown in fig. 13.



Fig. 13. Output Waveform of proposed full adder
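The energy reductions quoted above for the XOR gate and the full adder follow directly from the PDP columns of Tables IV and V. The helper below is a hypothetical illustration of that arithmetic; the numeric values are copied from those tables.

```python
def reduction_percent(reference, proposed):
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


# PDP values in units of 1e-18 J
xor_vs_sr_cpl = reduction_percent(117.49, 51.7)   # Table IV: SR-CPL vs FS-GDI XOR
adder_vs_tg = reduction_percent(189, 150)         # Table V: TG vs proposed adder
adder_vs_gdi = reduction_percent(512, 150)        # Table V: GDI vs proposed adder

print(f"{xor_vs_sr_cpl:.1f} {adder_vs_tg:.1f} {adder_vs_gdi:.1f}")
# prints "56.0 20.6 70.7"
```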

Using A=1111 and B=0111, it can be observed from Table VI that the multiplier achieves significant merits in terms of power, delay, energy, and transistor count compared to CMOS and GDI designs.

The simulation results for the different multiplier types designed using the 18T full adder based on full-swing GDI show an energy improvement of 35% for the array multiplier, 34% for the Braun multiplier, and 32% for the Baugh-Wooley multiplier compared to the CMOS logic style. The transistor count decreases by 35% for the array and Braun multipliers and by 37% for the Baugh-Wooley multiplier. The energy of the GDI-based array, Braun, and Baugh-Wooley multiplier designs is significantly higher than that of the full-swing GDI-based designs. Compared to conventional GDI, the energy improves by 59.6% for the array multiplier, 51.2% for the Braun multiplier, and 66% for the Baugh-Wooley multiplier. Minimum energy is achieved by the proposed multiplier designs with the 18T full adder based on full-swing GDI gates.

TABLE IV. SIMULATION RESULTS OF DIFFERENT XOR MODULES IN TSMC 65 NM CMOS PROCESS TECHNOLOGY AT 250 MHZ, VDD 1 V.

| Design       | Power (µW) | Delay (ps) | PDP (e-18 J) | No. of Transistors | Output         |
|--------------|------------|------------|--------------|--------------------|----------------|
| LP [6]       | 1.9        | 26.7       | 50.7         | 4T                 | Not Full Swing |
| GDI [3]      | 2.4        | 38.8       | 93           | 4T                 | Not Full Swing |
| MUX-GDI [4]  | 2.3        | 30.3       | 69.7         | 8T                 | Full Swing     |
| SR-CPL [14]  | 3.1        | 37.9       | 117.49       | 10T                | Full Swing     |
| FS-GDI [9]   | 2          | 25.85      | 51.7         | 6T                 | Full Swing     |

TABLE V. SIMULATION RESULTS OF DIFFERENT FULL ADDER CIRCUITS IN TSMC 65 NM CMOS PROCESS TECHNOLOGY AT 250 MHZ, VDD 1V.

| Design             | Power (µW) | Delay (ps) | PDP (e-18 J) | No. of Transistors |
|--------------------|------------|------------|--------------|--------------------|
| CPL [14]           | 6          | 63.8       | 382          | 38                 |
| C-CMOS [15]        | 4.1        | 65.75      | 269.5        | 28                 |
| TG [7]             | 4.4        | 40         | 189          | 18                 |
| Mirror [16]        | 4.8        | 39         | 187.2        | 28                 |
| GDI [11]           | 6.4        | 80         | 512          | 18                 |
| SR-CPL [6]         | 4.43       | 31         | 137.33       | 26                 |
| DPL [6]            | 5.3        | 35         | 172.25       | 28                 |
| Hybrid [16]        | 4.2        | 43         | 180.6        | 18                 |
| TG+GDI [17]        | 4.3        | 48.6       | 209          | 21                 |
| Hybrid-CMOS [18]   | 4.2        | 84.5       | 203.7        | 24                 |
| PROPOSED           | 4          | 37.5       | 150          | 18                 |

TABLE VI. SIMULATION RESULTS OF DIFFERENT MULTIPLIER ARCHITECTURES IN TSMC 65 NM CMOS PROCESS TECHNOLOGY AT 250 MHZ, VDD 1 V.

| Design                        | Power (µW) | Delay (ps) | PDP (e-18 J) | No. of Transistors |
|-------------------------------|------------|------------|--------------|--------------------|
| Array Multiplier-CMOS         | 53.8       | 146        | 7854.8       | 400                |
| Array Multiplier-GDI          | 109        | 159        | 17331        | 296                |
| Array Multiplier-FSGDI        | 44         | 133        | 5852         | 260                |
| Braun Multiplier-CMOS         | 48.5       | 133.5      | 6474.7       | 400                |
| Braun Multiplier-GDI          | 78.9       | 189        | 14912        | 296                |
| Braun Multiplier-FSGDI        | 38.5       | 230        | 8855         | 260                |
| Baugh-Wooley Multiplier-CMOS  | 55         | 160        | 8800         | 532                |
| Baugh-Wooley Multiplier-GDI   | 121        | 187        | 22627        | 382                |
| Baugh-Wooley Multiplier-FSGDI | 47         | 145        | 6815         | 366                |
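The PDP column is the product of the power and delay columns: 1 µW × 1 ps = 10⁻⁶ W × 10⁻¹² s = 10⁻¹⁸ J, which is why the energy is reported in units of e-18 J. The sketch below recomputes the column from the Table VI entries; the dictionary keys and function names are hypothetical labels introduced for illustration.

```python
import math

# (power in uW, delay in ps, reported PDP in 1e-18 J), copied from Table VI
TABLE_VI = {
    "array_cmos": (53.8, 146, 7854.8),
    "array_gdi": (109, 159, 17331),
    "array_fsgdi": (44, 133, 5852),
    "braun_cmos": (48.5, 133.5, 6474.7),
    "braun_gdi": (78.9, 189, 14912),
    "braun_fsgdi": (38.5, 230, 8855),
    "baugh_wooley_cmos": (55, 160, 8800),
    "baugh_wooley_gdi": (121, 187, 22627),
    "baugh_wooley_fsgdi": (47, 145, 6815),
}


def pdp_e18_joule(power_uw, delay_ps):
    """Power-delay product in units of 1e-18 J (uW times ps)."""
    return power_uw * delay_ps


def pdp_column_is_consistent(rel_tol=1e-3):
    """True if every reported PDP equals power x delay up to rounding."""
    return all(
        math.isclose(pdp_e18_joule(p, d), pdp, rel_tol=rel_tol)
        for p, d, pdp in TABLE_VI.values()
    )
```

The tolerance only absorbs the rounding of the tabulated values.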

## VI. CONCLUSION

This work presents an 18T full adder designed in a 65 nm TSMC process using the full-swing GDI technique and simulated using the Cadence Virtuoso simulator. The simulation results show improvements in power consumption, delay, and transistor count while maintaining full-swing operation compared to other approaches. Compared to CMOS, the proposed designs show a 35% improvement in energy for the array multiplier, 34% for the Braun multiplier, and 32% for the Baugh-Wooley multiplier, and they report better results than GDI multipliers in terms of power, delay, and energy. In future, this work will be suitable for designing filters for DSP applications.

## VII. REFERENCES

- [1] N. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, 4th ed., Addison-Wesley, 2011.
- [2] A. Morgenshtein, A. Fish, and I. Wagner, "Gate-diffusion input (GDI): a power-efficient method for digital combinatorial circuits," IEEE Trans. VLSI Syst., vol. 10, no. 5, pp. 566–581, 2002.
- [3] A. Morgenshtein, I. Schwartz, and A. Fish, "Gate Diffusion Input (GDI) logic in standard CMOS Nanoscale process," 2010 IEEE 26th Convention of Electrical and Electronics Engineers in Israel, 2011.
- [4] A. Morgenshtein, V. Yuzhaninov, A. Kovshilovsky, and A. Fish, "Full-swing gate diffusion input logic—Case-study of low-power CLA adder design," Integration, the VLSI Journal, vol. 47, no. 1, pp. 62–70, 2014.
- [5] Mansi Jhamb, Garima, and Himanshu Lohani, "Design, implementation and performance comparison of multiplier topologies in power-delay space," Engineering Science and Technology, an International Journal, 2015.
- [6] Mariano Aguirre-Hernandez and Monico Linares-Aranda, "CMOS Full-Adders for Energy-Efficient Arithmetic Applications," IEEE Trans. VLSI Syst., vol. 19, no. 4, April 2011.
- [7] Jyh-Ming Wang, Sung-Chuan Fang, and Wu-Shiung Feng, "New Efficient Designs for XOR and XNOR Functions on the Transistor Level," IEEE Journal of Solid-State Circuits, vol. 29, no. 7, July 1994.
- [8] A. Shams, T. Darwish, and M. Bayoumi, "Performance analysis of low-power 1-bit CMOS full adder cells," IEEE Trans. VLSI Syst., vol. 10, no. 1, pp. 20–29, 2002.
- [9] S. R. Chowdhury, A. Banerjee, A. Roy, and H. Saha, "A high speed 8 transistor full adder design using novel 3 transistor XOR gates," International Journal of Electronics and Communication Engineering, vol. 2, no. 10, 2008.
- [10] Mohan Shoba and Rangaswamy Nakkeeran, "GDI based full adders for energy efficient arithmetic applications," Engineering Science and Technology, an International Journal, 2016.
- [11] John B. Gosling, Design of Arithmetic Units for Digital Computers, 1980.
- [12] Mohan Shoba and Rangaswamy Nakkeeran, "Performance Analysis of 1 bit Full Adder Using GDI Logic," ICICES2014, S.A. Engineering College, Chennai, Tamil Nadu, India, 2014.
- [13] E. Abu-Shama and M. Bayoumi, "A new cell for low-power adders," in Proc. Int. Midwest Symp. Circuits Syst., 1995.
- [14] T. Bhagyalaxmi, S. Rajendra, and S. Srinivas, "Power-Aware Alternative Adder Cell Structure Using Swing Restored Complementary Pass-Transistor Logic at 45nm Technology," 2nd International Conference on Nanomaterials and Technologies (CNT 2014), 2014.
- [15] M. Alioto, G. Di Cataldo, and G. Palumbo, "Mixed Full Adder topologies for high-performance low-power arithmetic circuits," Microelectronics Journal, vol. 38, pp. 130–139, 2007.
- [16] Korra Ravi Kumar, P. Mahipl Reddy, M. Sadanandam, Santhosh Kumar A., and M. Raju, "Design of 2T XOR Gate Based Full Adder Using GDI Technique," International Conference on Innovative Mechanisms for Industry Applications (ICIMIA 2017), 2017.
- [17] Nehru Kandasamy, Firdous Ahmad, Shasikanth Reddy, Ramesh Babu M, Nagarjuna Telagam, and Somanaidu Utlapalli, "Performance Analysis of 4-Bit MAC Unit using Hybrid GDI & Transmission Gate based Adder and Multiplier Circuits in 180 nm & 90 nm Technology," Microprocessors and Microsystems Journal, 2018.
- [18] Chiou-Kou Tung, Yu-Cherng Hung, Shao-Hui Shieh, and Guo-Shing Huang, "A Low-Power High-Speed Hybrid CMOS Full Adder for Embedded System," 2007 IEEE Design and Diagnostics of Electronic Circuits and Systems conference, 2007.
- [19] Mohan Shoba and Rangaswamy Nakkeeran, "GDI based full adders for energy efficient arithmetic applications," Engineering Science and Technology, an International Journal, 2015.